AUTOMATED VOICE PATTERN FILTER



BACKGROUND OF THE INVENTION 

1. FIELD OF THE INVENTION

The present invention generally relates to Automated Voice Pattern 
("AVP") methods and devices. The present invention particularly relates to AVP 
methods and devices for providing a client-based voice pattern data packet for 
improving speech recognition performance. 

2. DESCRIPTION OF THE RELATED ART 

An Automated Speech Recognition ("ASR") platform as known in the art is 
designed to respond to a reception of a transmitted speech signal (e.g., voice 
commands) from a transceiver (e.g., mobile phones, embedded car phones, and 
phone enabled personal data assistants) with an audio signal that corresponds 
to the context of the transmitted speech signal. However, a performance of a 
prior art ASR platform can be adversely affected by any signal degradation of the 
transmitted speech signal (e.g., acoustical coupling and signal distortion) along a 
transmission signal path from a user of the transceiver to the ASR platform. The 
performance can also be adversely affected by variations in the voice pattern 
characteristics between different users of a transceiver. 

Signal degradation of the transmitted speech signal has been addressed
by the invention of a pre-ASR filter. The differences in voice patterns between
individual users of the transceiver are addressed by the present invention.



SUMMARY OF THE INVENTION 

The present invention relates to an automated voice pattern filter that 
overcomes the aforementioned disadvantages of the prior art. Various aspects
of the invention are novel, non-obvious, and provide various advantages. While
the actual nature of the present invention covered herein can only be determined 
with reference to the claims appended hereto, certain features, which are 
characteristic of the embodiments disclosed herein, are described briefly as 
follows. 

One form of the present invention is an automated voice pattern filtering
method implemented in a system having a client side and a server side. At the
client side, a speech signal is transformed into a first set of spectral parameters
which are encoded into a set of spectral shapes that are compared to a second
set of spectral parameters corresponding to one or more keywords. From the
comparison, the client side determines if the speech signal is acceptable. If so,
spectral information indicative of a difference in a voice pattern between the
speech signal and the keyword(s) is encoded and utilized as a basis to generate
a voice pattern filter.

The foregoing form, and other forms, features and advantages of the
invention will become further apparent from the following detailed description of
the presently preferred embodiments, read in conjunction with the accompanying
drawings. The detailed description and drawings are merely illustrative of the
invention rather than limiting, the scope of the invention being defined by the
appended claims and equivalents thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustration of a hands-free, in-vehicle environment in
accordance with the present invention;

FIG. 2 is a block diagram of one embodiment of a transceiver and a
filtering system during an initialization of a voice pattern filter in accordance with
the present invention;

FIG. 3 is a block diagram of one embodiment of a voice pattern
recognition system in accordance with the present invention;

FIG. 4 is an illustration of one embodiment of a voice data packet in
accordance with the present invention; and

FIG. 5 is a block diagram of one embodiment of a filtering system during
an operation of a voice pattern filter in accordance with the present invention.

DETAILED DESCRIPTION OF THE PRESENTLY PREFERRED EMBODIMENTS

FIG. 1 represents a signal path during a time involving the transmissions
and receptions of various voice signals between a client side and a server side of
the system. Specifically, FIG. 1 illustrates a hands-free, in-vehicle environment
containing a conventional vehicle 10 on the client side of the system, a
conventional wireless network 30, a conventional wireline network 40, a new and
unique filtering system 50 on the server side of the system, and a conventional
ASR platform 60 on the server side of the system. A user 11 of a transceiver in
the form of a mobile phone 20 is seated within vehicle 10. In other embodiments
of the present invention, the transceiver can be in the form of an embedded car
phone, a phone enabled personal data assistant, and any other transceiver for
transmitting and receiving a phone call.

A more detailed explanation of the invention will now be provided herein.
Those having ordinary skill in the art will appreciate that the various described
signals are based upon a discrete time instant k and the various described filters
are based upon a discrete time, frequency domain operator z. Specifically, the
operator z is used to represent the frequency response characteristics of the
filters and the models described herein.
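
As a point of reference only, the discrete time, frequency domain operator z can be read in the conventional z-transform sense. The notation below is the textbook definition and is not stated in the original text; the symbols x[k], h[k], X(z), H(z), and Y(z) are generic placeholders rather than signals of the described system:

```latex
X(z) = \sum_{k=-\infty}^{\infty} x[k]\, z^{-k}, \qquad
Y(z) = H(z)\, X(z), \qquad
H(e^{j\omega}) = H(z)\big|_{z = e^{j\omega}}
```

Under this reading, the frequency response of a filter is its transfer function H(z) evaluated on the unit circle.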

Mobile phone 20 conventionally transmits a transmission signal T1[k] to
wireless network 30 in response to user 11 articulating a speech signal U1[k] in a
direction of a microphone (not shown) of mobile phone 20. Speech signal U1[k]
is a main component of transmission signal T1[k]. A noise signal N1[k] consisting
of noise emanating from various sources of vehicle 10 (e.g., an engine, a
heater/air conditioner, a radio, and a pair of wiper blades) is also a component
of transmission signal T1[k]. In addition, an audio signal (not shown) being an
acoustically coupled form of an audio signal R3[k] is a component of transmission
signal T1[k]. Transmission signal T1[k] therefore ranges from a slightly distorted
version of speech signal U1[k] to a significantly distorted version of speech signal
U1[k] as a function of an intensity of the vehicle noise signal N1[k] and an
intensity of the audio signal (not shown) generated by mobile phone 20, wireless
network 30, and wireline network 40.

Wireless network 30 (e.g., an advanced
mobile phone service, a time division multiple access network, a code division
multiple access network, and a global system for mobile communications)
conventionally transmits a transmission signal T2[k] to wireline network 40 in
response to a reception of transmission signal T1[k] by wireless network 30. The
conventional transmission of transmission signal T2[k] involves a degree of signal
distortion and a degree of signal attenuation of transmission signal T1[k] by
wireless network 30. Transmission signal T2[k] therefore ranges from a slightly
distorted version of transmission signal T1[k] to a significantly distorted version of
transmission signal T1[k] as a function of an intensity of the signal distortion and
an intensity of the signal attenuation by wireless network 30 upon transmission
signal T1[k].

Wireline network 40 (e.g., a Public Switched Telephone Network, and a
VoIP network) conventionally transmits a transmission signal T3[k] to filtering
system 50 in response to a reception of transmission signal T2[k] by wireline
network 40. The conventional transmission of transmission signal T3[k] involves
a degree of signal distortion and a degree of signal attenuation of transmission
signal T2[k] by wireline network 40. Transmission signal T3[k] therefore ranges
from a slightly distorted version of transmission signal T2[k] to a significantly
distorted version of transmission signal T2[k] as a function of an intensity of the
signal distortion and an intensity of the signal attenuation by wireline network 40
upon transmission signal T2[k].
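
The cumulative degradation along the signal path can be pictured with a toy simulation. The following is a minimal sketch only: the additive mixing at the phone, the `transmit` helper, the gain of 0.9, the distortion level, and the sinusoidal stand-ins for speech and coupled audio are all assumptions made for illustration and are not taken from the described system.

```python
import numpy as np

def transmit(signal, gain, distortion_std, rng):
    """Hypothetical network hop: attenuate the input and add random distortion."""
    return gain * signal + rng.normal(0.0, distortion_std, size=signal.shape)

def rms_error(a, b):
    """Root-mean-square difference between two equally long signals."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
k = np.arange(800)                                # discrete time instants k
u1 = np.sin(2 * np.pi * 0.02 * k)                 # stand-in for speech signal U1[k]
n1 = 0.1 * rng.normal(size=k.size)                # stand-in for vehicle noise N1[k]
coupled = 0.05 * np.sin(2 * np.pi * 0.05 * k)     # stand-in for coupled audio

t1 = u1 + n1 + coupled                            # assumed additive mix at mobile phone 20
t2 = transmit(t1, 0.9, 0.02, rng)                 # wireless network 30
t3 = transmit(t2, 0.9, 0.02, rng)                 # wireline network 40

print(rms_error(u1, t1), rms_error(u1, t3))
```

In this toy setting the error of T3[k] relative to U1[k] exceeds that of T1[k], which mirrors the stage-by-stage distortion described above.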

As shown in FIG. 5, filtering system 50 includes a voice pattern filter 52
and an ASR filtering device 53 to transmit a speech signal U2[k] to ASR
platform 60 (e.g., a computer platform employing commercially available speech
recognition software from Nuance of Menlo Park, California or SpeechWorks of
Boston, Massachusetts) in response to a reception of transmission signal T3[k]
and audio signal R1[k] by filtering system 50. The unique transmission of speech
signal U2[k] by filtering system 50 involves two important aspects. First, voice
pattern filter 52 provides a speech signal T4[k] to ASR filtering device 53 in
response to transmission signal T3[k] whereby a voice pattern characteristic of
user 11 is ascertained to thereby enhance the voice recognition capability of
ASR platform 60.

Second, as described in U.S. Patent Application Serial No. (to be filled in later)
entitled "Automated Speech Recognition Filter", the entirety of which is
incorporated herein by reference, the ASR filtering device 53 utilizes profile
based characteristics of vehicle 10, mobile phone 20, wireless network 30, and
wireline network 40 as well as real-time signal characteristics of
transmission signal T4[k], audio signal R1[k], and an estimate of vehicle noise
signal N1[k]. The result is a transmission of speech signal U2[k] by filtering
system 50 to ASR platform 60 as an approximation of speech signal U1[k]. An
improved performance of ASR platform 60 is therefore facilitated by a reception
of speech signal U2[k] by ASR platform 60.

FIG. 2 represents the data transmission path that is necessary to transmit
a data packet DP to the server side of the system. Specifically, FIG. 2 illustrates
a generation of voice pattern filter 52. First, the user 11 articulates a speech
signal U1[k] including one or more pre-specified keywords Wp (1 ≤ p ≤ P)
whereby a voice pattern recognition module 21 receives a speech signal U3[k]
resulting from the summation of speech signal U1[k], noise signal N1[k], and an
audio signal (not shown) being an acoustically coupled form of audio signal
R3[k]. In response thereto, voice pattern recognition module 21 provides a data
packet DP via wireless network 30 and wireline network 40 to filtering system 50
when the frequency characteristics of the speech signal U1[k] as represented by
the spectral vector Vp are acceptable when compared to its corresponding
keyword Wp. In response to data packet DP, a linear interpolator 51
conventionally establishes an input for voice pattern filter 52. Conversely, the
voice pattern recognition module 21 provides a rejection message RM to user 11
via a speaker of mobile phone 20 when the frequency characteristics of the
speech signal U1[k] as represented by the spectral vector Vp are unacceptable.

FIG. 3 illustrates one embodiment of voice pattern recognition module 21
for ascertaining the acceptability of the spectral vector Vp. A preprocessor 22
receives speech signal U3[k] and in response thereto, provides a set of pole-zero
coefficients {ai, ui}. In one embodiment, a Linear Prediction Model (LPM) is used
to represent the speech signal U3[k] in accordance with the following equation
[1]:

    U_3[k] = \sum_{i=1}^{L} a_i \, U_3[k-i] + e[k]    [1]



Equation [1] uses the assumption that the speech signal U3[k] is a linear
combination of L previous samples. The ai coefficients are the resulting
predictor coefficients, which are chosen to minimize a mean square filter
prediction error signal e[k] summed over the analysis window. The preprocessor
22 transforms the speech signal U3[k] into a representation of a corresponding
spectral signal U3(z). The transformed pole-zero transfer function is computed in
accordance with the following equation [2]:



    U3(z) = [expression illegible in the source]    [2]



with the assumption that spectral signal U3(z) is minimum phase. 
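
The predictor coefficients of equation [1] can be estimated in several ways. The following is a minimal sketch that assumes an ordinary least-squares fit over the analysis window using NumPy; the function name `lpc_coefficients` and the synthetic second-order test signal are hypothetical, and the described preprocessor 22 may use a different estimation procedure.

```python
import numpy as np

def lpc_coefficients(signal, order):
    """Least-squares estimate of a_1..a_L in u[k] ~ sum_i a_i * u[k-i]."""
    u = np.asarray(signal, dtype=float)
    # Each row holds the L previous samples u[k-1], ..., u[k-L] for one instant k.
    rows = np.array([u[k - order:k][::-1] for k in range(order, len(u))])
    targets = u[order:]
    coeffs, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    residual = targets - rows @ coeffs   # prediction error e[k] over the window
    return coeffs, residual

# Synthetic signal that obeys u[k] = 1.5*u[k-1] - 0.7*u[k-2] exactly.
u = [1.0, 1.5]
for _ in range(48):
    u.append(1.5 * u[-1] - 0.7 * u[-2])

a, e = lpc_coefficients(u, order=2)
print(a)   # close to [1.5, -0.7]
```

Because the synthetic signal is an exact linear combination of its two previous samples, the fit recovers the generating coefficients and the prediction error is negligible.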

A feature extractor 23 receives pole-zero coefficients {ai, ui}, and in
response thereto, provides a set of cepstral coefficients C(n) representative of
spectral parameters corresponding to speech signal U3[k]. In one embodiment,
feature extractor 23 computes the cepstral coefficients C(n) in accordance with
the following equation [3]:



    C(n) = \frac{1}{n} \sum_{i} a_i^{n} - \frac{1}{n} \sum_{i} u_i^{n}    [3]
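
One way to read equation [3] is as the textbook cepstrum of a minimum-phase pole-zero model, in which the n-th coefficient is 1/n times the difference between the sum of the n-th powers of the poles and the sum of the n-th powers of the zeros. The sketch below rests on that assumption, treating the ai values as poles and the ui values as zeros; the function name `cepstrum_from_roots` is hypothetical.

```python
def cepstrum_from_roots(poles, zeros, count):
    """Return C(1)..C(count) with C(n) = (sum p_i**n - sum z_i**n) / n."""
    coeffs = []
    for n in range(1, count + 1):
        total = sum(p ** n for p in poles) - sum(z ** n for z in zeros)
        # Conjugate root pairs give a real total; keep only the real part.
        coeffs.append(complex(total).real / n)
    return coeffs

# Single real pole at 0.5 and no zeros.
c = cepstrum_from_roots([0.5], [], 3)
print(c)
```

For a single pole at 0.5 the coefficients are 0.5, 0.125, and 0.5**3 / 3, matching the series expansion of -log(1 - 0.5 z^-1).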






A vector quantization codebook 24 receives cepstral coefficients C(n), and 
in response thereto, conventionally provides spectral vector Vp. In one 
embodiment, vector quantization codebook 24 conventionally transforms the 
cepstral coefficients C(n) to the spectral vector Vp. 
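
Vector quantization of this kind is commonly realized as a nearest-neighbor lookup. The following is a minimal sketch, assuming a Euclidean distance and a small made-up codebook; the function name `quantize` and the codebook entries are hypothetical and the contents of vector quantization codebook 24 are not given in the text.

```python
import numpy as np

def quantize(cepstral, codebook):
    """Return the index and entry of the codebook vector nearest to `cepstral`."""
    codebook = np.asarray(codebook, dtype=float)
    diffs = codebook - np.asarray(cepstral, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)   # one distance per codebook entry
    index = int(np.argmin(distances))
    return index, codebook[index]

# Hypothetical three-entry codebook of two-dimensional vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
index, vector = quantize([0.9, 1.2], codebook)
print(index, vector)
```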



A vector classifier 26 receives the spectral vector Vp as well as keyword
Wp from a keywords module 25. It is assumed that the dimension of the spectral
vector Vp and keyword Wp is m. In response thereto, the vector classifier 26
provides either the data packet DP or the rejection message RM. In one
embodiment, the vector classifier 26 first computes an index p* in accordance
with the following equation [4]:

    p^{*} = \arg\min_{1 \le p \le P} d(V_p, W_p)    [4]

where d(Vp, Wp) is a distance between spectral vector Vp and keyword Wp, and
p* is the index for which that distance is smallest.
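
The selection of equation [4], together with the comparison against a threshold T, can be sketched as follows. This is an illustration under stated assumptions: the distance d is taken to be Euclidean, a single spectral vector is compared with every keyword, and the function name `classify` and the numeric values are hypothetical.

```python
import numpy as np

def classify(spectral_vector, keywords, threshold):
    """Return (p_star, distance, accepted) for the closest keyword."""
    v = np.asarray(spectral_vector, dtype=float)
    # Assumed Euclidean distance to each keyword W_1..W_P.
    distances = [float(np.linalg.norm(v - np.asarray(w, dtype=float)))
                 for w in keywords]
    p_star = int(np.argmin(distances))            # index of the smallest distance
    accepted = distances[p_star] < threshold      # True -> DP, False -> RM
    return p_star, distances[p_star], accepted

keywords = [[0.0, 0.0], [3.0, 4.0]]
p_star, dist, accepted = classify([3.0, 3.5], keywords, threshold=1.0)
print(p_star, dist, accepted)
```

An accepted result corresponds to providing data packet DP, and a rejected result corresponds to providing rejection message RM.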

Next, the vector classifier 26 ascertains whether the distance d(Vp, Wp) is less than
a threshold T. If so, the vector classifier 26 provides data packet DP. Otherwise,
the vector classifier 26 provides rejection message RM. In one embodiment, the
data packet DP includes at least a packet header 70, and a set of voice pattern
bytes 71 having m bytes of spectral information A = [A1, A2, ..., Am] which
represents the average spectral difference between spectral vector Vp and
corresponding keyword Wp. The purpose of the linear interpolator 51 is to
transform the discrete spectral information A = [A1, A2, ..., Am] into a continuous
frequency spectrum A(z) employed by voice pattern filter 52, which captures the
spectral difference between the speech signal U3[k] and keyword Wp. With voice
pattern filter 52, the performance of ASR platform 60 can be improved by
accounting for the spectral difference between individual speakers.
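
The packing of the m voice pattern bytes and their linear interpolation can be sketched as follows. Everything specific here is an assumption for illustration: the two-byte `HEADER`, the signed 8-bit encoding, the `scale` factor, the use of a single difference rather than an average, and the function names `build_packet` and `interpolate_spectrum` are not defined in the text.

```python
import numpy as np

HEADER = b"VP"  # hypothetical stand-in for packet header 70

def build_packet(spectral_vector, keyword, scale=16.0):
    """Pack the difference V - W as m signed bytes after the header."""
    diff = np.asarray(spectral_vector, float) - np.asarray(keyword, float)
    quantized = np.clip(np.round(diff * scale), -128, 127).astype(np.int8)
    return HEADER + quantized.tobytes()

def interpolate_spectrum(packet, num_points, scale=16.0):
    """Linearly interpolate the m packed values onto num_points frequencies."""
    values = np.frombuffer(packet[len(HEADER):], dtype=np.int8) / scale
    band_positions = np.linspace(0.0, 1.0, num=values.size)
    grid = np.linspace(0.0, 1.0, num=num_points)   # normalized frequency axis
    return np.interp(grid, band_positions, values)

packet = build_packet([1.0, 2.0, 3.0], [0.5, 2.0, 4.0])
spectrum = interpolate_spectrum(packet, 5)
print(len(packet), spectrum)
```

With m = 3 the packet holds the header plus three bytes, and the interpolated spectrum passes through the three packed values at the ends and midpoint of the normalized frequency axis.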

Voice pattern module 21 (FIGS. 2 and 3) may consist of hardware (digital
and/or analog), software, or a combination of hardware and software. Those
having ordinary skill in the art will appreciate a sequential operation of the
components of voice pattern module 21 (e.g., in a software implementation) and
a concurrent operation of each component of the voice pattern module 21 (e.g.,
in a hardware implementation). In alternative embodiments, voice pattern
module 21 may be alternatively incorporated within wireless network 30 (FIG. 2),
wireline network 40 (FIG. 2), and filtering system 50, or distributed among
transceiver 20, wireless network 30, wireline network 40 and/or filtering system
50.

Voice pattern filter 52 (FIGS. 2 and 5) may consist of hardware (digital
and/or analog), software, or a combination of hardware and software. In
alternative embodiments, voice pattern filter 52 may be alternatively incorporated
within transceiver 20, wireless network 30 (FIG. 2), and wireline network 40 (FIG.
2), or distributed among transceiver 20, wireless network 30, wireline network 40
and/or filtering system 50.

Filtering system 50 has been described herein as a pre-filtering system in
electrical communication with ASR platform 60 (FIG. 1). In alternative
embodiments of the present invention, filtering system 50 may be incorporated
into ASR platform 60.

Filtering system 50 has also been described herein in the context of an
employment within a telecommunication system having a transceiver situated
within a vehicle. In alternative embodiments of the present invention, filtering
system 50 may be employed within various other systems used for audio
communication purposes such as, for example, a video conferencing system,
and the transceivers of such systems can be situated within the system as would
occur to those having ordinary skill in the art.

While the embodiments of the present invention disclosed herein are
presently considered to be preferred, various changes and modifications can be
made without departing from the spirit and scope of the invention. The scope of
the invention is indicated in the appended claims, and all changes that come
within the meaning and range of equivalents are intended to be embraced
therein.